Abstract

Expected Shortfall (ES) in several variants has been proposed as a remedy for the deficiencies
of Value-at-Risk (VaR), which in general is not a coherent risk measure. In fact,
most definitions of ES lead to the same results when applied to continuous loss distributions.
Differences may appear when the underlying loss distributions have discontinuities. In this
case even the coherence property of ES can get lost unless one takes care of the details in
its definition. We compare some of the definitions of Expected Shortfall, pointing out that
there is one which is robust in the sense of yielding a coherent risk measure regardless of
the underlying distributions. Moreover, this Expected Shortfall can be estimated effectively
even in cases where the usual estimators for VaR fail.

Key words: Expected Shortfall; risk measure; worst conditional expectation; tail conditional
expectation; value-at-risk (VaR); conditional value-at-risk (CVaR); tail mean; coherence;
quantile; sub-additivity.

1 Introduction 

Value-at-Risk (VaR) as a risk measure is heavily criticized for not being sub-additive (see [7]
for an overview of the criticism). This means that the risk of a portfolio can be larger than the
sum of the stand-alone risks of its components when measured by VaR (cf. [?], [?], or [?]).
Hence, managing risk by VaR may fail to stimulate diversification. Moreover, VaR does not take
into account the severity of an incurred damage event.

As a response to these deficiencies the notion of coherent risk measures was introduced in
[?], [?], and [?]. An important example for a risk measure of this kind is the worst conditional
expectation (WCE) (cf. Definition 5.2 in [?]). This notion is closely related to the tail conditional
expectation (TCE) from Definition 5.1 in [?], but in general does not coincide with it (see section
5 below). Unfortunately, a somewhat misleading formulation in [?] suggests this coincidence to
be true. Meanwhile, several authors (e.g. [?], [13], or [?]) proposed modifications to TCE,
thereby increasing confusion, since the relation of these modifications to TCE and WCE remained
obscure to a certain degree.

The identification of TCE and WCE is to a certain degree a temptation, though the authors of
[?] actually did their best to warn the reader. WCE is in fact coherent but useful only in
a theoretical setting, since it requires the knowledge of the whole underlying probability space,
while TCE lends itself naturally to practical applications but is not coherent (see Example
5.4 below). The goal to construct a risk measure which is both coherent and easy to compute and
to estimate was however achieved in [?]. The definition of Expected Shortfall (ES) at a specified
level $\alpha$ in [?] (Definition 2.6 below) is the literal mathematical transcription of the concept
"average loss in the worst $100\alpha\%$ cases". We rely on this definition of Expected Shortfall in the
present paper, despite the fact that in the literature this term was already used sometimes in
another meaning.

With the paper at hand we strive primarily for making transparent the relations between the
notions developed in [?], [?], [13], and [?]. We present four characterizations of Expected Shortfall:
as integral of all the quantiles below the corresponding level (eq. (3.3)), as limit in a tail strong
law of large numbers (Proposition 4.1), as minimum of a certain functional introduced in [?]
(Corollary 4.3 below), and as maximum of WCEs when the underlying probability space varies
(Corollary 6.3). This way, we will show that the ES definition in [?] is complementary and even
in some aspects superior to the other notions. Moreover, in a certain sense any law invariant
coherent risk measure has a representation with ES as the main building block (see [11]).



Some hints on the organization of the paper: 

In section 2 we give precise mathematical definitions to the five notions to be discussed. These
are WCE, TCE, CVaR (conditional value-at-risk), ES, and its negative, the so-called $\alpha$-tail
mean (TM). Section 3 presents useful properties of $\alpha$-tail mean and ES, namely the integral
representation (3.3), continuity and monotonicity in the level $\alpha$ as well as coherence for ES. In
section 4 we show first that $\alpha$-tail mean arises naturally as limit of the average of the $100\alpha\%$
worst cases in a sample. Then we point out that in fact ES and CVaR are two different names
for the same object. Section 5 is devoted to inequalities and examples clarifying the relations
between ES, TCE, and WCE. In section 6 we deal with the question how to state a general
representation of ES in terms of WCE. Section 7 concludes the paper.



2 Basic definitions 

We have to arrange a minimum set of definitions to be consistent with the notions used in [?],
[13], and [?]. Fix for this section some real-valued random variable $X$ on a probability space
$(\Omega, \mathcal{A}, P)$. $X$ is considered the random profit or loss of some asset or portfolio. For the purpose
of this paper, we are mainly interested in losses, i.e. low values of $X$. By $E[\ldots]$ we will denote
expectation with respect to $P$. Fix also some confidence level $\alpha \in (0, 1)$. We will often make use
of the indicator function

(2.1)   $1_A(a) = 1_A = \begin{cases} 1, & a \in A, \\ 0, & a \notin A. \end{cases}$



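Definition 2.1 (Quantiles)

$x_{(\alpha)} = q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}$ is the lower $\alpha$-quantile of $X$,
$x^{(\alpha)} = q^\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] > \alpha\}$ is the upper $\alpha$-quantile of $X$.

We use the $x$-notation if the dependence on $X$ is evident, otherwise the $q$-notation. □

Note that $x^{(\alpha)} = \sup\{x \in \mathbb{R} : P[X < x] \le \alpha\}$. From $\{x \in \mathbb{R} : P[X \le x] > \alpha\} \subset \{x \in \mathbb{R} :
P[X \le x] \ge \alpha\}$ it is clear that $x_{(\alpha)} \le x^{(\alpha)}$. Moreover, it is easy to see that

(2.2)   $x_{(\alpha)} = x^{(\alpha)}$ if and only if $P[X \le x] = \alpha$ for at most one $x$,

and in case $x_{(\alpha)} < x^{(\alpha)}$

(2.3)   $\{x \in \mathbb{R} : \alpha = P[X \le x]\} = \begin{cases} [x_{(\alpha)}, x^{(\alpha)}), & P[X = x^{(\alpha)}] > 0, \\ [x_{(\alpha)}, x^{(\alpha)}], & P[X = x^{(\alpha)}] = 0. \end{cases}$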

(2.2) and (2.3) explain why it is difficult to say that there is an obvious definition for
value-at-risk (VaR). We join here [?] taking as $\mathrm{VaR}^\alpha$ the smallest value such that the probability of the
absolute loss being at most this value is at least $1 - \alpha$. As this is not really comprehensible when
said in words, here is the formal definition:

Definition 2.2 (Value-at-risk)

$\mathrm{VaR}^\alpha = \mathrm{VaR}^\alpha(X) = -x^{(\alpha)} = q_{1-\alpha}(-X)$ is the value-at-risk at level $\alpha$ of $X$. □
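As an illustration of Definitions 2.1 and 2.2, the following minimal sketch computes the lower and upper quantiles and the value-at-risk of a finite discrete distribution. The function names, the dictionary representation `{value: probability}`, and the example distribution are assumptions made for this sketch only; exact rational arithmetic is used so that the comparisons with $\alpha$ are not blurred by rounding.

```python
from fractions import Fraction


def lower_quantile(dist, alpha):
    """Lower alpha-quantile inf{x : P[X <= x] >= alpha} of a finite distribution."""
    cum = Fraction(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def upper_quantile(dist, alpha):
    """Upper alpha-quantile inf{x : P[X <= x] > alpha} of a finite distribution."""
    cum = Fraction(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum > alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def value_at_risk(dist, alpha):
    """VaR at level alpha: the negative of the upper alpha-quantile."""
    return -upper_quantile(dist, alpha)


# Hypothetical example: a loss of 100 with probability 1/20, no loss otherwise.
dist = {-100: Fraction(1, 20), 0: Fraction(19, 20)}
alpha = Fraction(1, 20)
print(lower_quantile(dist, alpha), upper_quantile(dist, alpha), value_at_risk(dist, alpha))
# prints: -100 0 0
```

At $\alpha = 1/20$ the two quantiles differ, which is exactly the situation in which the choice of quantile in the definition of VaR matters.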

The definition of tail conditional expectation (TCE) given in [?], Definition 5.1, depends on
the choice of quantile taken for VaR (and of some discount factor we neglect here for reasons
of simplicity). But as there is a choice for VaR there is also a choice for TCE. That is why
we consider a lower and an upper TCE. Denote the positive part of a number $x$ by

$x^+ = \begin{cases} x, & x > 0, \\ 0, & x \le 0, \end{cases}$

and its negative part by $x^- = (-x)^+$.

Definition 2.3 (Tail conditional expectations)

Assume $E[X^-] < \infty$. Then $\mathrm{TCE}_\alpha = \mathrm{TCE}_\alpha(X) = -E[X \mid X \le x_{(\alpha)}]$ is the lower tail conditional
expectation at level $\alpha$ of $X$.

$\mathrm{TCE}^\alpha = \mathrm{TCE}^\alpha(X) = -E[X \mid X \le x^{(\alpha)}]$ is the upper tail conditional expectation at level $\alpha$ of
$X$. □






$\mathrm{TCE}^\alpha$ is (up to a discount factor) the tail conditional expectation from Definition 5.1 in [?].
"Lower" and "upper" here correspond to the quantiles used for the definitions, but not to the
proportion of the quantities. In fact,

(2.4)   $\mathrm{TCE}_\alpha \ge \mathrm{TCE}^\alpha$

is obvious.

As Delbaen says in the proof of Theorem 6.10 in [?], $\mathrm{TCE}^\alpha$ in general does not define a
sub-additive risk measure (see Example 5.4 below). For this reason, in [?], Definition 5.2, the worst
conditional expectation (WCE) was introduced. Here is the definition (up to a discount factor)
in our terms:

Definition 2.4 (Worst conditional expectation)

Assume $E[X^-] < \infty$. Then $\mathrm{WCE}_\alpha = \mathrm{WCE}_\alpha(X) = -\inf\{E[X \mid A] : A \in \mathcal{A}, P[A] > \alpha\}$ is the
worst conditional expectation at level $\alpha$ of $X$. □

Observe that under the assumption $E[X^-] < \infty$ the value of $\mathrm{WCE}_\alpha$ is always finite since
then $\lim_{t \to \infty} P[X \le x^{(\alpha)} + t] = 1$ implies that there is some event $A = \{X \le x^{(\alpha)} + t\}$ with
$P[A] > \alpha$ and $E[|X| 1_A] < \infty$. We will see below that Definition 2.4 has to be treated
with care nevertheless because the notation $\mathrm{WCE}_\alpha(X)$ hides the fact that it depends not only on
the distribution of $X$ but also on the structure of the underlying probability space. From the
definition it is clear that for any random variables $X$ and $Y$ on the same probability space

$\mathrm{WCE}_\alpha(X + Y) \le \mathrm{WCE}_\alpha(X) + \mathrm{WCE}_\alpha(Y)$,

i.e. WCE is sub-additive. Moreover, Proposition 5.1 in [?] says $\mathrm{WCE}_\alpha \ge \mathrm{TCE}^\alpha$. Hence $\mathrm{WCE}_\alpha$ is
a majorant to $\mathrm{TCE}^\alpha \ge \mathrm{VaR}^\alpha$. It is in fact the smallest coherent risk measure dominating $\mathrm{VaR}^\alpha$
and only depending on $X$ through its distribution if the underlying probability space is "rich"
enough (see Theorem 6.10 in [?] for details).

This is a nice result, but to a certain degree unsatisfactory since the infimum does not seem too
handy. This observation might have been the reason for introducing the conditional value-at-risk
(CVaR) in [?] (see also the references therein) and [13]. CVaR can be used as a base for very
efficient optimization procedures. We quote here, up to the sign of the random variable and the
corresponding change from $\alpha$ to $1 - \alpha$ (cf. Definition 2.2), equation (1.2) from [13].

Definition 2.5 (Conditional value-at-risk)

Assume $E[X^-] < \infty$. Then $\mathrm{CVaR}^\alpha = \mathrm{CVaR}^\alpha(X) = \inf\left\{ \frac{E[(X - s)^-]}{\alpha} - s : s \in \mathbb{R} \right\}$ is the
conditional value-at-risk at level $\alpha$ of $X$. □



Note that by Proposition 4.2 and (4.9), CVaR is well-defined. But beware: Pflug states in
equation (1.3) of [13] (translated to our setting, i.e. $-X$ instead of $Y$ and $1 - \alpha$ instead of $\alpha$)
the relation $\mathrm{CVaR}^\alpha(X) = \mathrm{TCE}^\alpha(X)$, without any assumption. Corollary 4.3 in connection with
Corollary 5.3 shows that this is only true if $P[X < x^{(\alpha)}] = 0$, or $P[X < x^{(\alpha)}] > 0$ and
$P[X \le x^{(\alpha)}] = \alpha$ (in particular if the distribution of $X$ is continuous).

The last definition we need is that of $\alpha$-tail mean from [?]. In order to make it comparable to
the risk measures defined so far, we define it in two variants: the tail mean, which is likely to
be negative but appears in a statistical context (cf. Proposition 4.1 below), and the Expected
Shortfall, representing potential loss as a number that is positive in most cases. The advantage of tail mean
is the explicit representation allowing an easy proof of super-additivity (hence sub-additivity for
its negative) independent of the distributions of the underlying random variables (cf. the theorem
in the appendix of [?]). We will see below (Corollary 4.3) that the Expected Shortfall is in fact
identical with CVaR and enjoys properties such as coherence and continuity and monotonicity in the
confidence level (section 3). Moreover, it is in a specific sense the largest possible value WCE
can take (Corollary 6.3).



Definition 2.6 (Tail mean and Expected Shortfall)

Assume $E[X^-] < \infty$. Then

$\bar{x}_{(\alpha)} = \mathrm{TM}_\alpha(X) = \alpha^{-1} \left( E[X 1_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} (\alpha - P[X \le x_{(\alpha)}]) \right)$ is the $\alpha$-tail mean at level $\alpha$
of $X$.

$\mathrm{ES}_\alpha = \mathrm{ES}_\alpha(X) = -\bar{x}_{(\alpha)}$ is the Expected Shortfall (ES) at level $\alpha$ of $X$. □

Note that by Corollary 4.3 $\alpha$-tail mean and $\mathrm{ES}_\alpha$ only depend on the distribution of $X$ and the
level $\alpha$ but not on a particular definition of quantile.
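The tail mean formula of Definition 2.6 can be evaluated directly for a finite discrete distribution. The sketch below does this; the function names and the `{value: probability}` dictionary are illustrative assumptions, and exact fractions are used to keep the correction term $x_{(\alpha)}(\alpha - P[X \le x_{(\alpha)}])$ free of rounding error.

```python
from fractions import Fraction


def tail_mean(dist, alpha):
    """alpha-tail mean of a finite distribution given as {value: probability}."""
    # Lower alpha-quantile: smallest x with P[X <= x] >= alpha.
    cum = Fraction(0)
    quantile = None
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha:
            quantile = x
            break
    if quantile is None:
        raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")
    # E[X 1{X <= quantile}] and P[X <= quantile].
    partial_mean = sum(x * p for x, p in dist.items() if x <= quantile)
    mass = sum(p for x, p in dist.items() if x <= quantile)
    # The second term removes the excess mass P[X <= quantile] - alpha at the quantile.
    return (partial_mean + quantile * (alpha - mass)) / alpha


def expected_shortfall(dist, alpha):
    """Expected Shortfall: the negative of the alpha-tail mean."""
    return -tail_mean(dist, alpha)


# Hypothetical example: a loss of 100 with probability 1/20, no loss otherwise.
dist = {-100: Fraction(1, 20), 0: Fraction(19, 20)}
print(expected_shortfall(dist, Fraction(1, 10)))  # prints: 50
print(expected_shortfall(dist, Fraction(1, 25)))  # prints: 100
```

In the first call the worst 10% of cases consist of the 5% default cases and another 5% of cases without loss, so the average loss is 50; in the second call the worst 4% are all default cases.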

3 Useful properties of tail mean and Expected Shortfall 

The most important property of ES (Definition 2.6) might be its coherence.

Proposition 3.1 (Coherence of ES) Let $\alpha \in (0, 1)$ be fixed. Consider a set $V$ of real-valued
random variables on some probability space $(\Omega, \mathcal{A}, P)$ such that $E[X^-] < \infty$ for all $X \in V$. Then
$\rho : V \to \mathbb{R}$ with $\rho(X) = \mathrm{ES}_\alpha(X)$ for $X \in V$ is a coherent risk measure in the sense of Definition
2.1 in [?], i.e. it is

(i) monotonous: $X \in V$, $X \ge 0$ $\Rightarrow$ $\rho(X) \le 0$,

(ii) sub-additive: $X, Y, X + Y \in V$ $\Rightarrow$ $\rho(X + Y) \le \rho(X) + \rho(Y)$,

(iii) positively homogeneous: $X \in V$, $h > 0$, $hX \in V$ $\Rightarrow$ $\rho(hX) = h\rho(X)$, and

(iv) translation invariant: $X \in V$, $a \in \mathbb{R}$ $\Rightarrow$ $\rho(X + a) = \rho(X) - a$.

Proof. See Proposition A.1 in the Appendix for an elementary proof of (ii). To check (i), (iii)
and (iv) is an easy exercise (cf. also Proposition 3.2). □



In the financial industry there is a growing necessity to deal with random variables with
discontinuous distributions. Examples are portfolios of not-traded loans (purely discrete distributions)
or portfolios containing derivatives (mixtures of continuous and discrete distributions). One
problem with tail risk measures like VaR, TCE, and WCE, when applied to discontinuous
distributions, may be their sensitivity to small changes in the confidence level $\alpha$. In other words,
they are not in general continuous with respect to the confidence level $\alpha$ (see Example 5.4).

In contrast, $\mathrm{ES}_\alpha$ is continuous with respect to $\alpha$. Hence, regardless of the underlying
distributions, one can be sure that the risk measured by $\mathrm{ES}_\alpha$ will not change dramatically when there
is a switch in the confidence level by, say, a few basis points. We are going to derive this
insensitivity property in Corollary 3.3 below as a consequence of an alternative representation of
tail mean. This integral representation (Proposition 3.2), which was already given in [3] for the
case of continuous distributions, might be of interest on its own. Another, almost self-evident,
important property of $\mathrm{ES}_\alpha$ is its monotonicity in $\alpha$. The smaller the level $\alpha$ the greater is the
risk. We show this formally in Proposition 3.4.



Proposition 3.2 If $X$ is a real-valued random variable on a probability space $(\Omega, \mathcal{A}, P)$ with
$E[X^-] < \infty$ and $\alpha \in (0, 1)$ is fixed, then

$\bar{x}_{(\alpha)} = \alpha^{-1} \int_0^\alpha x_{(u)} \, du$,

with $\bar{x}_{(\alpha)}$ and $x_{(u)}$ as in Definitions 2.6 and 2.1, respectively.

Proof. By switching to another probability space if necessary, we can assume that there is a
real random variable $U$ on $(\Omega, \mathcal{A}, P)$ that is uniformly distributed on $(0, 1)$, i.e. $P[U \le u] = u$,
$u \in (0, 1)$. It is well-known that then the random variable $Z = x_{(U)}$ has the same distribution
as $X$.

Since $u \mapsto x_{(u)}$ is non-decreasing we have

(3.1)   $\{U \le \alpha\} \subset \{Z \le x_{(\alpha)}\}$ and

(3.2)   $\{U > \alpha\} \cap \{Z \le x_{(\alpha)}\} \subset \{Z = x_{(\alpha)}\}$.

By (3.1) and (3.2) we obtain

$\int_0^\alpha x_{(u)} \, du = E[Z 1_{\{U \le \alpha\}}]$
$\qquad = E[Z 1_{\{Z \le x_{(\alpha)}\}}] - E[Z 1_{\{U > \alpha\} \cap \{Z \le x_{(\alpha)}\}}]$
$\qquad = E[X 1_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} (\alpha - P[X \le x_{(\alpha)}])$.

Dividing by $\alpha$ now yields the assertion. □
Note that by definition of Expected Shortfall, Proposition 3.2 implies the representation

(3.3)   $\mathrm{ES}_\alpha(X) = -\alpha^{-1} \int_0^\alpha q_u(X) \, du$.

Eq. (3.3) shows that ES is the coherent risk measure used in [11] as main building block for the
representation of law invariant coherent risk measures.
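For a finite discrete distribution the lower quantile function is piecewise constant, so the integral of the quantiles over $(0, \alpha]$ can be computed exactly and compared with the explicit tail mean formula. The following sketch does this comparison; the function names and the example distribution are assumptions for illustration, and all atoms are assumed to carry positive probability.

```python
from fractions import Fraction


def tail_mean_by_formula(dist, alpha):
    """Tail mean via E[X 1{X <= q}] + q (alpha - P[X <= q]), q the lower quantile."""
    cum = Fraction(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha:
            quantile = x
            break
    partial_mean = sum(x * p for x, p in dist.items() if x <= quantile)
    mass = sum(p for x, p in dist.items() if x <= quantile)
    return (partial_mean + quantile * (alpha - mass)) / alpha


def tail_mean_by_integral(dist, alpha):
    """Tail mean as alpha^{-1} times the integral of the lower quantile over (0, alpha]."""
    total = Fraction(0)
    cum = Fraction(0)
    for x in sorted(dist):
        left, cum = cum, cum + dist[x]
        if left >= alpha:
            break
        # The lower u-quantile equals x for u in (left, cum]; keep the part below alpha.
        total += x * (min(cum, alpha) - left)
    return total / alpha


# Hypothetical three-point distribution.
dist = {-10: Fraction(1, 10), -2: Fraction(3, 10), 5: Fraction(6, 10)}
for alpha in (Fraction(1, 20), Fraction(1, 4), Fraction(2, 5), Fraction(9, 10)):
    assert tail_mean_by_formula(dist, alpha) == tail_mean_by_integral(dist, alpha)
print(tail_mean_by_integral(dist, Fraction(1, 4)))  # prints: -26/5
```

The agreement of both functions on these levels is only a numerical check of the integral representation for one example, not a substitute for the proof.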



Corollary 3.3 If $X$ is a real-valued random variable with $E[X^-] < \infty$, then the mappings
$\alpha \mapsto \bar{x}_{(\alpha)}$ and $\alpha \mapsto \mathrm{ES}_\alpha$ are continuous on $(0, 1)$.

Proof. Immediate from Proposition 3.2 and (3.3). □

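For some of the results below and in particular the subsequent proposition on monotonicity of
the tail mean and ES, a further representation for $\bar{x}_{(\alpha)}$ is useful (cf. Appendix in [?]). Let for
$x \in \mathbb{R}$

(3.4)   $1^{(\alpha)}_{\{X \le x\}} = \begin{cases} 1_{\{X \le x\}}, & \text{if } P[X = x] = 0, \\ 1_{\{X \le x\}} + \frac{\alpha - P[X \le x]}{P[X = x]} 1_{\{X = x\}}, & \text{if } P[X = x] > 0. \end{cases}$

Then a short calculation shows

(3.5)   $1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \in [0, 1]$,

(3.6)   $E\left[ 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right] = \alpha$, and

(3.7)   $\alpha^{-1} E\left[ X \, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right] = \bar{x}_{(\alpha)}$.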



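Proposition 3.4 If $X$ is a real-valued random variable with $E[X^-] < \infty$, then for any $\alpha \in (0, 1)$
and any $\epsilon > 0$ with $\alpha + \epsilon < 1$ we have the following inequalities:

$\bar{x}_{(\alpha+\epsilon)} \ge \bar{x}_{(\alpha)}$ and
$\mathrm{ES}_{\alpha+\epsilon}(X) \le \mathrm{ES}_\alpha(X)$.

Proof. We adopt the representation (3.7). This yields

$\bar{x}_{(\alpha+\epsilon)} - \bar{x}_{(\alpha)} = E\left[ X \left( (\alpha+\epsilon)^{-1} 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - \alpha^{-1} 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) \right]$
$\qquad = (\alpha(\alpha+\epsilon))^{-1} E\left[ X \left( \alpha 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon) 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) \right]$
$\qquad \ge (\alpha(\alpha+\epsilon))^{-1} E\left[ x_{(\alpha)} \left( \alpha 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon) 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) \right]$
$\qquad = \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \left( \alpha E\left[ 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} \right] - (\alpha+\epsilon) E\left[ 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right] \right)$
$\qquad = \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \left( \alpha(\alpha+\epsilon) - (\alpha+\epsilon)\alpha \right)$
$\qquad = 0$.

The inequality is due to the fact that by (3.4) and (3.5)

$\alpha 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon) 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \begin{cases} \le 0, & \text{if } X < x_{(\alpha)}, \\ \ge 0, & \text{if } X > x_{(\alpha)}. \end{cases}$ □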



4 Motivation for tail mean and Expected Shortfall 

Assume that we want to estimate the lower $\alpha$-quantile $x_{(\alpha)}$ of some random variable $X$. Let
some sample $(X_1, \ldots, X_n)$, drawn from independent copies of $X$, be given. Denote by $X_{1:n} \le
\ldots \le X_{n:n}$ the components of the ordered $n$-tuple $(X_1, \ldots, X_n)$. Denote by $\lfloor x \rfloor$ the integer part
of the number $x \in \mathbb{R}$, hence

$\lfloor x \rfloor = \max\{n \in \mathbb{Z} : n \le x\}$.

Then the order statistic $X_{\lfloor n\alpha \rfloor : n}$ appears as natural estimator for $x_{(\alpha)}$. Nevertheless, it is well
known that in case of a non-unique quantile (i.e. $x_{(\alpha)} < x^{(\alpha)}$) the quantity $X_{\lfloor n\alpha \rfloor : n}$ does not
converge to $x_{(\alpha)}$. This follows for instance from Theorem 1 in [?] which says that

$1 = P[X_{\lfloor n\alpha \rfloor : n} \le x_{(\alpha)} \text{ infinitely often}] = P[X_{\lfloor n\alpha \rfloor : n} \ge x^{(\alpha)} \text{ infinitely often}]$.

Surprisingly, we get a well-determined limit when we replace the single order statistic by an
average over the left tail of the sample. Recall the definition (2.1) of an indicator function.

Proposition 4.1 Let $\alpha \in (0, 1)$ be fixed, $X$ a real random variable with $E[X^-] < \infty$ and
$(X_1, X_2, \ldots)$ an independent sequence of random variables with the same distribution as $X$.
Then with probability 1

(4.1)   $\lim_{n \to \infty} \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor} = \bar{x}_{(\alpha)}$.

If $X$ is integrable, then the convergence in (4.1) holds in $L^1$, too.
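The estimator appearing in (4.1) is simply the average of the $\lfloor n\alpha \rfloor$ smallest sample values. A minimal sketch, with a hypothetical function name and a deterministic toy sample chosen for this illustration, could look as follows. Passing $\alpha$ as a `Fraction` is a design choice of the sketch: it avoids floating-point surprises when computing the integer part of $n\alpha$.

```python
import math
from fractions import Fraction


def tail_mean_estimate(sample, alpha):
    """Average of the floor(n * alpha) smallest observations, as in (4.1)."""
    n = len(sample)
    k = math.floor(n * alpha)
    if k < 1:
        raise ValueError("sample too small for this level: floor(n * alpha) is 0")
    ordered = sorted(sample)  # order statistics X_{1:n} <= ... <= X_{n:n}
    return sum(ordered[:k]) / k


# Toy sample: 5 observations with a loss of 100 among 100 observations.
sample = [0] * 95 + [-100] * 5
estimate = tail_mean_estimate(sample, Fraction(1, 10))
print(estimate)   # prints: -50.0
print(-estimate)  # corresponding Expected Shortfall estimate, prints: 50.0
```

For this sample the single order statistic at position $\lfloor n\alpha \rfloor = 10$ equals 0 and ignores the size of the losses, whereas the tail average reflects them.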



Proof. Due to Proposition 3.2, the "with probability 1" part of Proposition 4.1 is essentially
a special case of Theorem 3.1 in |l]| with = to < a = ti < t2 = I, J(t) = l(o,a](*)> 
<7n(£) = l/ n LW"J +! ](£); g(t) = F' 1 ^), and p\ = P2 = 00. Concerning the Li -convergence note 
that 

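$\sum_{i=1}^{\lfloor n\alpha \rfloor} |X_{i:n}| \le \sum_{i=1}^{n} |X_i|$.

By the strong law of large numbers $n^{-1} \sum_{i=1}^{n} |X_i|$ converges in $L^1$. This implies uniform
integrability for $n^{-1} \sum_{i=1}^{n} |X_i|$ and for $\lfloor n\alpha \rfloor^{-1} \sum_{i=1}^{\lfloor n\alpha \rfloor} |X_{i:n}|$. Together with the already proven almost
sure convergence this implies the assertion. □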






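To see how a direct proof of the almost sure convergence in Proposition 4.1 would work consider
the following heuristic computation. Observe first that

$\frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor} = \lfloor n\alpha \rfloor^{-1} \left( \sum_{i=1}^{n} X_{i:n} 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} + \sum_{i=1}^{n} X_{i:n} \left( 1_{\{1, \ldots, \lfloor n\alpha \rfloor\}}(i) - 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \right) \right)$

$\qquad = \lfloor n\alpha \rfloor^{-1} \left( \sum_{i=1}^{n} X_i 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \sum_{i=1}^{n} \left( 1_{\{1, \ldots, \lfloor n\alpha \rfloor\}}(i) - 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \right) \right)$

(4.2)   $\qquad = \frac{n}{\lfloor n\alpha \rfloor} \left( \frac{1}{n} \sum_{i=1}^{n} X_i 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \left( \frac{\lfloor n\alpha \rfloor}{n} - \frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}} \right) \right)$.

If we now had

(4.3)   $\lim_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)}$

with probability 1, in connection with $\lim_{n \to \infty} n / \lfloor n\alpha \rfloor = 1/\alpha$ it would be plausible to obtain
(4.1). Unfortunately (4.3) is not true in general, but only

(4.4)   $\liminf_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)}$ and $\limsup_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x^{(\alpha)}$.

Nevertheless the proof could be completed on the basis of (4.2) by using (4.4) together with the
Glivenko-Cantelli theorem and Corollary 4.3 below.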



Proposition 4.1 validates the interpretation given to $\alpha$-tail mean in [?] as mean of the worst
$100\alpha\%$ cases. This concept, which seems very natural from an insurance or risk management
point of view, has so far appeared in the literature by different kinds of conditional expectation
beyond VaR, which is a different concept for discrete distributions. "Tail conditional
expectation", "worst conditional expectation", "conditional value at risk" all bear also in their name the
fact that they are conditional expected values of the random variable $X$ (note that concerning
CVaR, by Corollary 4.3 below this is a misinterpretation). For $\mathrm{TCE}_\alpha$, for instance, the natural
estimator is not given by the one analyzed in (4.1) or its negative, but rather by

(4.5)   $-\frac{\sum_{i=1}^{n} X_i 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}}{\sum_{i=1}^{n} 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}}$,

which however has problems of convergence in case $x_{(\alpha)} < x^{(\alpha)}$.

This is the reason why we avoid the term "conditional" in our definition of $\alpha$-tail mean. In fact,
it is not very hard to see (cf. Example 5.4 below) that $\alpha$-tail mean does not admit a general
representation in terms of a conditional expectation of $X$ given some event $A \in \sigma(X)$ (i.e. some
event only depending on $X$). Hence it is not possible to give a definition of the type

(4.6)   $\bar{x}_{(\alpha)} = E[X \mid A]$ for some $A \in \sigma(X)$,

unless the event $A$ is chosen in a $\sigma$-algebra $\mathcal{A}' \supset \sigma(X)$ on an artificial new probability space (see
Corollary 6.2 below).



In order to make visible the coincidence of CVaR and tail mean, the following proposition collects
some facts on quantiles which are well-known in probability theory (cf. Exercise 3 in ch. 1 of [?]
or Problem 25.9 of [?] for the version cited here):

Proposition 4.2 Let $X$ be a real integrable random variable on some probability space $(\Omega, \mathcal{A}, P)$.
Fix $\alpha \in (0, 1)$ and define the function $H_\alpha : \mathbb{R} \to [0, \infty)$ by

(4.7)   $H_\alpha(s) = \alpha E[(X - s)^+] + (1 - \alpha) E[(X - s)^-]$.

Then the function $H_\alpha$ is convex (and hence continuous) with $\lim_{|s| \to \infty} H_\alpha(s) = \infty$. The set $M_\alpha$ of
minimizers to $H_\alpha$ is a compact interval, namely

(4.8)   $M_\alpha = [x_{(\alpha)}, x^{(\alpha)}] = \{s \in \mathbb{R} : P[X < s] \le \alpha \le P[X \le s]\}$. □



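Note the following equivalent representations for $H_\alpha$:

(4.9)   $H_\alpha(s) = \alpha E[X] + \alpha \left( \frac{E[(X - s)^-]}{\alpha} - s \right)$

(4.10)  $\phantom{H_\alpha(s)} = \alpha E[X] - \alpha \left( \frac{E[X 1_{\{X \le s\}}]}{\alpha} + s \, \frac{\alpha - P[X \le s]}{\alpha} \right)$.

From Definitions 2.5 and 2.6 for CVaR and ES, respectively, and by (4.10), in connection with
Proposition 4.2, we obtain the following corollary to the proposition.

Corollary 4.3 Let $X$ be a real integrable random variable on some probability space $(\Omega, \mathcal{A}, P)$
and $\alpha \in (0, 1)$ be fixed. Then

(4.11)  $\mathrm{ES}_\alpha(X) = \mathrm{CVaR}^\alpha(X) = -\alpha^{-1} \left( E[X 1_{\{X \le s\}}] + s (\alpha - P[X \le s]) \right), \quad s \in [x_{(\alpha)}, x^{(\alpha)}]$. □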
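The coincidence of CVaR and ES can be checked numerically for a finite discrete distribution. In that case the objective $s \mapsto E[(X - s)^-]/\alpha - s$ from Definition 2.5 is convex and piecewise linear with kinks only at the atoms of the distribution, so its minimum is attained at one of the atoms. The sketch below uses this fact; the function name and the example distribution are assumptions for illustration.

```python
from fractions import Fraction


def cvar(dist, alpha):
    """CVaR of Definition 2.5 for a finite distribution {value: probability}."""

    def objective(s):
        # E[(X - s)^-] = E[max(s - X, 0)]
        shortfall = sum(p * max(s - x, 0) for x, p in dist.items())
        return shortfall / alpha - s

    # The convex, piecewise linear objective attains its minimum at an atom.
    return min(objective(s) for s in dist)


# Hypothetical example: a loss of 100 with probability 1/20, no loss otherwise.
dist = {-100: Fraction(1, 20), 0: Fraction(19, 20)}
print(cvar(dist, Fraction(1, 10)))  # prints: 50
print(cvar(dist, Fraction(1, 25)))  # prints: 100
```

Both values agree with the Expected Shortfall of the same distribution computed from the explicit tail mean formula, namely $N p / \alpha = 50$ at level 1/10 and 100 at level 1/25.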



A further representation of ES or CVaR, respectively, as expectation of a suitably modified tail
distribution is given in the recent research report [14] (cf. Def. 3 therein).

In Definitions 2.5 and 2.6 only $E[X^-] < \infty$ is required for $X$. Indeed, this integrability condition
would suffice to guarantee (4.11). We formulated Corollary 4.3 with full integrability of $X$
because we wanted to rely on Proposition 4.2 for the proof.

Note that by a simple calculation one can show that (4.11) is equivalent to

(4.12)  $\mathrm{ES}_\alpha(X) = -\alpha^{-1} \left( E[X 1_{\{X < s\}}] + s (\alpha - P[X < s]) \right), \quad s \in [x_{(\alpha)}, x^{(\alpha)}]$.

By (4.12) we see that ES coincides with the coherent risk measure considered in Example 4 of
[?] (already mentioned in Example 4.2 of [?]).
5 Inequalities and counter-examples



In this section we compare the Expected Shortfall with the risk measures TCE and WCE defined
in section 2. Moreover, we present an example showing that VaR and TCE are not sub-additive
in general. By the same example we show that there is not a clear relationship between WCE
and lower TCE. We start with a result in the spirit of the Neyman-Pearson lemma.



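Proposition 5.1 Let $\alpha \in (0, 1)$ be fixed and $X$ be a real-valued random variable on some
probability space $(\Omega, \mathcal{A}, P)$. Suppose that there is some function $f : \mathbb{R} \to \mathbb{R}$ such that $E[(f \circ X)^-] < \infty$,
$f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$, and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$. Let $A \in \mathcal{A}$ be an event with
$P[A] \ge \alpha$ and $E[|f \circ X| 1_A] < \infty$. Then

(i) $\mathrm{TM}_\alpha(f \circ X) \le E[f \circ X \mid A]$,

(ii) $\mathrm{TM}_\alpha(f \circ X) = E[f \circ X \mid A]$ if $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and

(5.1)   $P[X < x_{(\alpha)}] = 0$ or

(5.2)   $P[X < x_{(\alpha)}] > 0$, $P[\Omega \setminus A \cap \{X < x_{(\alpha)}\}] = 0$, and $P[A] = \alpha$,

(iii) if $f(x) < f(x_{(\alpha)})$ for $x < x_{(\alpha)}$ and $f(x) > f(x_{(\alpha)})$ for $x > x_{(\alpha)}$, then $\mathrm{TM}_\alpha(f \circ X) =
E[f \circ X \mid A]$ implies $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and either (5.1) or (5.2).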



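Proof. Note that by assumption

$\{X \le x_{(\alpha)}\} \subset \{f \circ X \le f(x_{(\alpha)})\}$ and $\{X < x_{(\alpha)}\} \supset \{f \circ X < f(x_{(\alpha)})\}$.

Hence we see from (4.8) that

$P[f \circ X \le f(x_{(\alpha)})] \ge \alpha$ and $P[f \circ X < f(x_{(\alpha)})] \le \alpha$

and therefore

(5.3)   $q_\alpha(f \circ X) \le f(x_{(\alpha)}) \le q^\alpha(f \circ X)$.

Moreover, the assumption implies

(5.4)   $\{f \circ X \le f(x_{(\alpha)})\} \setminus \{X \le x_{(\alpha)}\} \subset \{f \circ X = f(x_{(\alpha)})\}$.

By Corollary 4.3, (5.3), (5.4), (3.6), and (3.7), we can calculate similarly to the proof of
Proposition 3.4

$E[f \circ X \mid A] - \mathrm{TM}_\alpha(f \circ X) = E\left[ f \circ X \left( P[A]^{-1} 1_A - \alpha^{-1} 1^{(\alpha)}_{\{f \circ X \le f(x_{(\alpha)})\}} \right) \right]$

$\qquad = (\alpha P[A])^{-1} \left\{ f(x_{(\alpha)}) E\left[ \alpha 1_A - P[A] 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right] + E\left[ (f \circ X - f(x_{(\alpha)})) \left( \alpha 1_A - P[A] 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) \right] \right\}$

$\qquad = (\alpha P[A])^{-1} E\left[ (f \circ X - f(x_{(\alpha)})) \left( \alpha 1_A - P[A] 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) \right]$

(5.5)   $\qquad \ge 0$.

Here, we obtain inequality (5.5) from the assumption on $f$ since

(5.6)   $\alpha 1_A - P[A] 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \begin{cases} \le 0, & \text{if } X < x_{(\alpha)}, \\ \ge 0, & \text{if } X > x_{(\alpha)}. \end{cases}$

This proves (i). The sufficiency and necessity respectively of the conditions in (ii) and (iii) for
equality in (5.5) are easily obtained by careful inspection of (5.6). □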



Note that the condition

(5.7)   $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and $P[\Omega \setminus A \cap \{X < x_{(\alpha)}\}] = 0$,

appearing in (ii) and (iii) of Proposition 5.1, means up to set differences of probability 0 that

(5.8)   $\{X < x_{(\alpha)}\} \subset A \subset \{X \le x_{(\alpha)}\}$.

In particular, (5.7) is implied by (5.8).

The proof of Proposition 5.1 is the hardest work in this section. Equipped with its result we are
in a position to derive without effort a couple of conclusions pointing out the relations between
TCE, WCE and ES. Recall $\mathrm{ES}_\alpha = -\mathrm{TM}_\alpha$.



Corollary 5.2 Let $\alpha \in (0, 1)$ and $X$ a real-valued random variable on some probability space
$(\Omega, \mathcal{A}, P)$ with $E[X^-] < \infty$. Then

(5.9)   $\mathrm{TCE}^\alpha(X) \le \mathrm{TCE}_\alpha(X) \le \mathrm{ES}_\alpha(X)$, and

(5.10)  $\mathrm{TCE}^\alpha(X) \le \mathrm{WCE}_\alpha(X) \le \mathrm{ES}_\alpha(X)$.

Proof. The first inequality in (5.9) is obvious (formally it follows from Lemma 5.1 in [?]). The
second follows from Proposition 5.1 (i) by setting $f(x) = x$, $A = \{X \le x_{(\alpha)}\}$, and observing
$P[X \le x_{(\alpha)}] \ge \alpha$. The first inequality in (5.10) was proven in Proposition 5.1 of [?]. The
second follows again from Proposition 5.1 (i) since all the events in the definition of WCE have
probabilities $> \alpha$. □



The following corollary to Proposition 5.1 presents in particular in (i) a first sufficient condition
for WCE and ES to coincide, namely continuity of the distribution of $X$.

Corollary 5.3 Let $\alpha$ and $X$ be as in Corollary 5.2. Then

(i) $P[X \le x^{(\alpha)}] = \alpha$, $P[X < x_{(\alpha)}] > 0$ or $P[X \le x^{(\alpha)}, X \ne x_{(\alpha)}] = 0$ if and only if

(5.11)  $\mathrm{ES}_\alpha(X) = \mathrm{WCE}_\alpha(X) = \mathrm{TCE}^\alpha(X) = \mathrm{TCE}_\alpha(X)$.

In particular, (5.11) holds if the distribution of $X$ is continuous, i.e. $P[X = x] = 0$ for all
$x \in \mathbb{R}$.

(ii) $P[X \le x_{(\alpha)}] = \alpha$ or $P[X < x_{(\alpha)}] = 0$ if and only if $\mathrm{ES}_\alpha(X) = \mathrm{TCE}_\alpha(X)$.

Proof. Concerning (i) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x^{(\alpha)}\}$ and Corollary
5.2. In order to obtain (ii) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x_{(\alpha)}\}$. □



Corollary 5.2 leaves open the relation between $\mathrm{TCE}_\alpha(X)$ and $\mathrm{WCE}_\alpha(X)$. The implication

(5.12)  $P[X \le x_{(\alpha)}] > \alpha \ \Rightarrow \ \mathrm{TCE}_\alpha(X) \le \mathrm{WCE}_\alpha(X)$

is obvious. Corollary 5.3 (ii) shows that

(5.13)  $P[X \le x_{(\alpha)}] = \alpha \ \Rightarrow \ \mathrm{TCE}_\alpha(X) \ge \mathrm{WCE}_\alpha(X)$.

The following example shows that all the inequalities between TCE, WCE, and ES in (5.9),
(5.10), (5.12), and (5.13) can be strict. Moreover, it shows that none of the quantities $-q_\alpha$,
$\mathrm{VaR}^\alpha$, $\mathrm{TCE}_\alpha$, or $\mathrm{TCE}^\alpha$ defines a sub-additive risk measure in general.
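Example 5.4

Consider the probability space $(\Omega, \mathcal{A}, P)$ with $\Omega = \{\omega_1, \omega_2, \omega_3\}$, $\mathcal{A}$ the set of all subsets of $\Omega$ and
$P$ specified by

$P[\{\omega_1\}] = P[\{\omega_2\}] = p, \quad P[\{\omega_3\}] = 1 - 2p$,

and choose $0 < p < \frac{1}{3}$. Fix some positive number $N$ and let $X_i$, $i = 1, 2$, be two random variables
defined on $(\Omega, \mathcal{A}, P)$ with values

$X_i(\omega_j) = \begin{cases} -N, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$

Choose $\alpha$ such that $0 < \alpha < 2p$. Then it is straightforward to obtain Table 1 with the values of
the risk measures interesting to us.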
the risk measures interesting to us. 
                         p < α < 2p               p = α                  p > α
Risk measure         X_{1,2}   X_1 + X_2     X_{1,2}   X_1 + X_2     X_{1,2}   X_1 + X_2
$-q_\alpha$             0          N            N          N            N          N
$\mathrm{VaR}^\alpha$           0          N            0          N            N          N
$\mathrm{TCE}^\alpha$           Np         N            Np         N            N          N
$\mathrm{TCE}_\alpha$           Np         N            N          N            N          N
$\mathrm{WCE}_\alpha$           N/2        N            N/2        N            N          N
$\mathrm{ES}_\alpha$            Np/α       N            N          N            N          N

Table 1: Values of risk measures for Example 5.4.

In case $p < \alpha < 2p$ we see from Table 1 that

$-q_\alpha(X_1) - q_\alpha(X_2) < -q_\alpha(X_1 + X_2)$,
$\mathrm{VaR}^\alpha(X_1) + \mathrm{VaR}^\alpha(X_2) < \mathrm{VaR}^\alpha(X_1 + X_2)$,
$\mathrm{TCE}^\alpha(X_1) + \mathrm{TCE}^\alpha(X_2) < \mathrm{TCE}^\alpha(X_1 + X_2)$,
$\mathrm{TCE}_\alpha(X_1) + \mathrm{TCE}_\alpha(X_2) < \mathrm{TCE}_\alpha(X_1 + X_2)$.

These inequalities show that none of the notions $-q_\alpha$, $\mathrm{VaR}^\alpha$, $\mathrm{TCE}^\alpha$, or $\mathrm{TCE}_\alpha$ can be used to
define a sub-additive risk measure. In case $p < \alpha < 2p$ we have also

$\mathrm{TCE}_\alpha(X_i) < \mathrm{ES}_\alpha(X_i)$,
$\mathrm{TCE}^\alpha(X_i) < \mathrm{WCE}_\alpha(X_i)$,
$\mathrm{TCE}_\alpha(X_i) < \mathrm{WCE}_\alpha(X_i)$,
(5.14)  $\mathrm{WCE}_\alpha(X_i) < \mathrm{ES}_\alpha(X_i)$.

Hence the second inequalities in (5.9), (5.10), and (5.12) may be strict, as can be the first
inequality in (5.10). In case $p = \alpha$ we have from Table 1 that

$\mathrm{TCE}^\alpha(X_i) < \mathrm{TCE}_\alpha(X_i)$ and
$\mathrm{TCE}_\alpha(X_i) > \mathrm{WCE}_\alpha(X_i)$.

Thus, also the first inequality in (5.9) and the inequality in (5.13) can be strict. In particular,
we see that there is not any clear relationship between $\mathrm{TCE}_\alpha$ and WCE. Beside the inequalities,
from the comparison with the results in the region $p > \alpha$, we get an example for the fact that all
the measures but ES may have discontinuities in $\alpha$. Moreover, in case $p < \alpha$ we have a stronger
version of (5.14), namely

$-\inf\{E[X_i \mid A] : A \in \mathcal{A}, P[A] \ge \alpha\} < \mathrm{ES}_\alpha(X_i)$,

which shows that even if one replaces "$>$" by "$\ge$" in Definition 2.4, strict inequality may appear
in the relation between WCE and ES. □

We finally observe that Example 5.4 is not so academic as it may seem at first glance since the
$X_i$'s may be thought of as two risky bonds of nominal $N$ with non-overlapping default states
$\omega_i$ of probability $p$.
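The values of Example 5.4 in the regime $p < \alpha < 2p$ can be reproduced by brute force on the three-point probability space, computing WCE by enumeration of all events. The sketch below is only an illustration: the function names and the concrete choices $N = 100$, $p = 1/5$, $\alpha = 3/10$ are assumptions made here, and random variables are represented as lists of values over the atoms $\omega_1, \omega_2, \omega_3$.

```python
from fractions import Fraction
from itertools import combinations


def quantiles(values, probs, alpha):
    """Return (lower, upper) alpha-quantile of a random variable on a finite space."""
    dist = {}
    for v, p in zip(values, probs):
        dist[v] = dist.get(v, 0) + p
    cum, lower = 0, None
    for x in sorted(dist):
        cum += dist[x]
        if lower is None and cum >= alpha:
            lower = x
        if cum > alpha:
            return lower, x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def value_at_risk(values, probs, alpha):
    return -quantiles(values, probs, alpha)[1]


def tce(values, probs, threshold):
    """-E[X | X <= threshold]; pass the lower or upper quantile as threshold."""
    mass = sum(p for v, p in zip(values, probs) if v <= threshold)
    return -sum(v * p for v, p in zip(values, probs) if v <= threshold) / mass


def wce(values, probs, alpha):
    """Worst conditional expectation by enumerating all events A with P[A] > alpha."""
    worst = None
    for size in range(1, len(values) + 1):
        for event in combinations(range(len(values)), size):
            mass = sum(probs[i] for i in event)
            if mass > alpha:
                cond_exp = sum(values[i] * probs[i] for i in event) / mass
                worst = cond_exp if worst is None else min(worst, cond_exp)
    return -worst


def expected_shortfall(values, probs, alpha):
    lower, _ = quantiles(values, probs, alpha)
    partial = sum(v * p for v, p in zip(values, probs) if v <= lower)
    mass = sum(p for v, p in zip(values, probs) if v <= lower)
    return -(partial + lower * (alpha - mass)) / alpha


N, p, alpha = 100, Fraction(1, 5), Fraction(3, 10)   # p < alpha < 2p
probs = [p, p, 1 - 2 * p]
x1, x2 = [-N, 0, 0], [0, -N, 0]
total = [a + b for a, b in zip(x1, x2)]

print(value_at_risk(x1, probs, alpha), value_at_risk(total, probs, alpha))            # prints: 0 100
print(wce(x1, probs, alpha), expected_shortfall(x1, probs, alpha))                    # prints: 50 200/3
print(expected_shortfall(total, probs, alpha))                                        # prints: 100
```

With these numbers VaR assigns no risk to either bond but a risk of 100 to the portfolio, while the Expected Shortfall of the portfolio (100) stays below the sum of the stand-alone values ($2 \cdot 200/3$).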
6 Representing ES in terms of WCE



By Example 5.4 we know that WCE and ES may differ in general. Nevertheless, we are going 
to show in the last part of the paper that this phenomenon can only occur when the underlying 
probability space is too "small" in the sense of not allowing a suitable representation of the 
random variable under consideration as function of a continuous random variable. Moreover, 
as long as only finitely many random variables are under consideration it is always possible to 
switch to a "larger" probability space in order to make WCE and ES coincide. Finally, we state 
a general representation of ES in terms of related WCEs. 



Proposition 6.1 Let $X$ and $Y$ be real-valued random variables on a probability space $(\Omega, \mathcal{A}, P)$
such that $E[Y^-] < \infty$. Fix some $\alpha \in (0, 1)$. Assume that $Y$ is given by $Y = f \circ X$ where $f$ satisfies
$f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$ and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$.

(i) If $P[X \le x_{(\alpha)}] = \alpha$ then

$\mathrm{ES}_\alpha(Y) = -\inf_{A \in \mathcal{A}, \, P[A] \ge \alpha} E[Y \mid A]$.

(ii) If the distribution function of $X$ is continuous then also

$\mathrm{ES}_\alpha(Y) = \mathrm{WCE}_\alpha(Y)$.

Proof. Concerning (i), by Proposition 5.1 (i) we only have to show

(6.1)   $\mathrm{TM}_\alpha(Y) = E[Y \mid X \le x_{(\alpha)}]$.

With the choice $A = \{X \le x_{(\alpha)}\}$ this follows from Proposition 5.1 (ii).

Concerning (ii), by Proposition 5.1 (i) and (6.1) we have to show that there is a sequence $(A_n)_{n \in \mathbb{N}}$ in
$\mathcal{A}$ with $P[A_n] > \alpha$ for all $n \in \mathbb{N}$ such that

$\lim_{n \to \infty} E[Y \mid A_n] = E[Y \mid X \le x_{(\alpha)}]$.

By continuity of the distribution of $X$ and integrability of $Y^-$ we obtain such a sequence with
the definition $A_n = \{X \le x^{(\alpha)} + 1/n\}$. □



Corollary 6.2 Let $(X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued random vector on a probability space $(\Omega, \mathcal{A}, P)$
such that $E[X_i^-] < \infty$, $i = 1, \ldots, d$. Fix $\alpha \in (0, 1)$. Then there is a random vector $(X'_1, \ldots, X'_d)$
on some probability space $(\Omega', \mathcal{A}', P')$ with the following two properties:

(i) The distributions of $(X_1, \ldots, X_d)$ and $(X'_1, \ldots, X'_d)$ are equal, i.e.

$P[X_1 \le x_1, \ldots, X_d \le x_d] = P'[X'_1 \le x_1, \ldots, X'_d \le x_d]$ for all $(x_1, \ldots, x_d) \in \mathbb{R}^d$.

(ii) Worst conditional expectation and Expected Shortfall coincide for all $i = 1, \ldots, d$, i.e.
$\mathrm{WCE}_\alpha(X'_i) = \mathrm{ES}_\alpha(X'_i)$, $i = 1, \ldots, d$. □

Proof. By Sklar's theorem (cf. Theorem 2.10.9 in [12]) we get the existence of a random vector
$(U_1, \ldots, U_d)$ where each $U_i$ is uniformly distributed on $(0, 1)$ such that (i) holds with $X'_i =
q_{U_i}(X_i)$, $i = 1, \ldots, d$. Since $q_\alpha$ is non-decreasing in $\alpha$ the assertion now follows from Proposition
6.1. □

Corollary 6.2 yields another proof for the sub-additivity of Expected Shortfall: in order to prove
$\mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) \ge \mathrm{ES}_\alpha(X + Y)$ apply the corollary to the underlying random vector $(X, Y, X + Y)$.

As a final consequence of Corollary 5.2 and Corollary 6.2 we note:

Corollary 6.3 Let $X$ be a real-valued random variable on some probability space $(\Omega, \mathcal{A}, P)$ with
$E[X^-] < \infty$. Fix $\alpha \in (0, 1)$. Then

$\mathrm{ES}_\alpha(X) = \max\left\{ \mathrm{WCE}_\alpha(X') : X' \text{ random variable on } (\Omega', \mathcal{A}', P') \text{ with } P'[X' \le x] = P[X \le x] \text{ for all } x \in \mathbb{R} \right\}$,

where the maximum is taken over all random variables $X'$ on probability spaces $(\Omega', \mathcal{A}', P')$ such
that the distributions of $X$ and $X'$ are equal. □



Corollary 6.3 shows that Expected Shortfall in the sense of Definition 2.6 may be considered a
robust version of worst conditional expectation (Definition 2.4), making the latter insensitive to
the underlying probability space.
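The dependence of WCE on the underlying probability space can be made visible with a small experiment. The sketch below is an illustration under a simplifying assumption that is not part of the paper: the probability space consists of $n$ equally likely atoms. In that special case every event with $k$ atoms has probability $k/n$, the smallest conditional mean among such events is the mean of the $k$ smallest values, and this mean is non-decreasing in $k$, so the infimum in Definition 2.4 is attained for the smallest admissible $k = \lfloor n\alpha \rfloor + 1$. The function names and the numbers are hypothetical.

```python
import math
from fractions import Fraction


def wce_equal_atoms(values, alpha):
    """WCE of X on a space of len(values) equally likely atoms (assumption of this sketch)."""
    n = len(values)
    k = math.floor(n * alpha) + 1        # smallest k with k / n > alpha
    worst = sorted(values)[:k]           # event made of the k worst atoms
    return -Fraction(sum(worst)) / k


def bond_on_atoms(n, loss=100, p=Fraction(1, 20)):
    """Same loss distribution on n atoms: loss with probability p, otherwise 0."""
    defaults = int(n * p)                # n is assumed to be a multiple of 20
    return [-loss] * defaults + [0] * (n - defaults)


alpha = Fraction(3, 40)
es = Fraction(100) * Fraction(1, 20) / alpha   # ES = N p / alpha for p < alpha
for n in (20, 200, 2000):
    print(n, wce_equal_atoms(bond_on_atoms(n), alpha))
# prints:
# 20 50
# 200 125/2
# 2000 10000/151
print(es)  # prints: 200/3
```

The distribution of the loss is the same in all three cases, yet WCE grows from 50 towards the Expected Shortfall $200/3$ as the space becomes finer, in line with the interpretation of ES as the largest value WCE can take.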



7 Conclusion 

In the paper at hand we have shown that simply taking a conditional expectation of losses
beyond VaR can fail to yield a coherent risk measure when there are discontinuities in the loss
distributions. Already existing definitions for some kind of expected shortfall, redressing this
drawback, as those in [?] or [13], did not provide representations suitable for efficient computation
and estimation in the general case. We have clarified the relations between these definitions and
the explicit one from [?], thereby pointing out that it is the definition which is most appropriate
for practical purposes.
A Appendix: Subadditivity of Expected Shortfall



We give here for the sake of completeness the proof of subadditivity for Expected Shortfall which
was originally given in Appendix A of [?].



For the proof it is convenient to adopt the representation of eq. (3.7) for the tail mean and
write the Expected Shortfall as

(A.1)   $\mathrm{ES}_\alpha(X) = -\frac{1}{\alpha} E\left[ X \, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right]$

with the function $1^{(\alpha)}_{\{X \le x\}}$ defined in (3.4).

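Proposition A.1 (Subadditivity of Expected Shortfall) Given two random variables $X$
and $Y$ with $E[X^-] < \infty$ and $E[Y^-] < \infty$ the following inequality holds:

(A.2)   $\mathrm{ES}_\alpha(X + Y) \le \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y)$

for any $\alpha \in (0, 1]$.

Proof. Defining $Z = X + Y$, we obtain by virtue of (3.6)

(A.3)   $\alpha \left( \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) - \mathrm{ES}_\alpha(Z) \right) = E\left[ Z \, 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - X \, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} - Y \, 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \right]$

$\qquad = E\left[ X \left( 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right) + Y \left( 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \right) \right]$

$\qquad \ge x_{(\alpha)} E\left[ 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \right] + y_{(\alpha)} E\left[ 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \right]$

$\qquad = x_{(\alpha)} (\alpha - \alpha) + y_{(\alpha)} (\alpha - \alpha) = 0$,

which proves the thesis. In the inequality above we used the fact that

(A.4)   $1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \begin{cases} \ge 0, & \text{if } X > x_{(\alpha)}, \\ \le 0, & \text{if } X < x_{(\alpha)}, \end{cases}$

which in turn is a consequence of (3.4) and (3.5). □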